AITopics | metadata extraction

Collaborating Authors

metadata extraction

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Metadata Extraction Leveraging Large Language Models

Han, Cuize, Jalagam, Sesh

arXiv.org Machine LearningOct-23-2025

The advent of Large Language Models has revolutionized tasks across domains, including the automation of legal document analysis, a critical component of modern contract management systems. This paper presents a comprehensive implementation of LLM-enhanced metadata extraction for contract review, focusing on the automatic detection and annotation of salient legal clauses. Leveraging both the publicly available Contract Understanding Atticus Dataset (CUAD) and proprietary contract datasets, our work demonstrates the integration of advanced LLM methodologies with practical applications. We identify three pivotal elements for optimizing metadata extraction: robust text conversion, strategic chunk selection, and advanced LLM-specific techniques, including Chain of Thought (CoT) prompting and structured tool calling. The results from our experiments highlight the substantial improvements in clause identification accuracy and efficiency. Our approach shows promise in reducing the time and cost associated with contract review while maintaining high accuracy in legal clause identification. The results suggest that carefully optimized LLM systems could serve as valuable tools for legal professionals, potentially increasing access to efficient contract review services for organizations of all sizes.

large language model, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2510.19334

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Mateo County > Redwood City (0.04)
Asia > China > Liaoning Province > Dalian (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Comparison of Feature Learning Methods for Metadata Extraction from PDF Scholarly Documents

Boukhers, Zeyd, Yang, Cong

arXiv.org Artificial IntelligenceJan-9-2025

The availability of metadata for scientific documents is pivotal in propelling scientific knowledge forward and for adhering to the FAIR principles (i.e. Findability, Accessibility, Interoperability, and Reusability) of research findings. However, the lack of sufficient metadata in published documents, particularly those from smaller and mid-sized publishers, hinders their accessibility. This issue is widespread in some disciplines, such as the German Social Sciences, where publications often employ diverse templates. To address this challenge, our study evaluates various feature learning and prediction methods, including natural language processing (NLP), computer vision (CV), and multimodal approaches, for extracting metadata from documents with high template variance. We aim to improve the accessibility of scientific documents and facilitate their wider use. To support our comparison of these methods, we provide comprehensive experimental results, analyzing their accuracy and efficiency in extracting metadata. Additionally, we provide valuable insights into the strengths and weaknesses of various feature learning and prediction methods, which can guide future research in this field.

extraction, metadata, metadata extraction, (17 more...)

arXiv.org Artificial Intelligence

2501.05082

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Cologne (0.04)
Asia > Japan > Honshū > Chūbu > Aichi Prefecture > Nagoya (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Supercharge Content Intelligence with AI

#artificialintelligenceJul-28-2020, 11:09:10 GMT

Artificial intelligence (AI) creates abundant opportunities for a wide range of intelligent, automated business operations. Two vital capabilities--metadata extraction and data enrichment--rank among the most valuable, commonly used functions for businesses seeking to harness immediate value from organizational data and content. AI-driven techniques for rapidly sorting, filtering, categorizing, and adding context to massive volumes of data can help deliver a distinct business advantage. By combining accessible, cloud-based AI services and customizable, specialized AI tools and training, businesses can shape data and content services to better meet their objectives. Despite the accelerating, never-ending spiral of accumulating content, most businesses aren't gaining the insights they need nor seeing visible operational benefits, as asserted in a Software Development Times article.

artificial intelligence, machine learning, natural language, (17 more...)

#artificialintelligence

Country: Europe > Spain > Aragón (0.06)

Industry: Information Technology > Services (0.37)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.40)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.36)

Add feedback

An Agent based Approach towards Metadata Extraction, Modelling and Information Retrieval over the Web

Ahmed, Zeeshan, Gerhard, Detlef

arXiv.org Artificial IntelligenceAug-7-2010

Web development is a challenging research area for its creativity and complexity. The existing raised key challenge in web technology technologic development is the presentation of data in machine read and process able format to take advantage in knowledge based information extraction and maintenance [4]. Currently it is not possible to search and extract optimized results using full text queries because there is no such mechanism exists which can fully extract the semantic from full text queries and then look for particular knowledge based information. Mechanism of presenting information over the web in a format so that the humans as well as machines can understand the context leads to the concept of Semantic Web introduced by Tim Berners Lee [4]. Semantic web is a linked mesh of information to produce technologies capable of reasoning on semi structured information and processed by machines [4].

data mining, information, information retrieval, (15 more...)

arXiv.org Artificial Intelligence

1008.1333

Country: Europe > Austria > Vienna (0.21)

Genre: Research Report (0.40)

Technology:

Information Technology > Communications > Web > Semantic Web (0.62)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.47)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.43)
(2 more...)

Add feedback